 single node


DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node

Neural Information Processing Systems

Current state-of-the-art approximate nearest neighbor search (ANNS) algorithms generate indices that must be stored in main memory for fast high-recall search. This makes them expensive and limits the size of the dataset. We present a new graph-based indexing and search system called DiskANN that can index, store, and search a billion-point database on a single workstation with just 64GB RAM and an inexpensive solid-state drive (SSD). Contrary to current wisdom, we demonstrate that the SSD-based indices built by DiskANN can meet all three desiderata for large-scale ANNS: high recall, low query latency, and high density (points indexed per node). On the billion-point SIFT1B bigann dataset, DiskANN serves > 5000 queries a second with < 3ms mean latency and 95%+ 1-recall@1 on a 16-core machine, where state-of-the-art billion-point ANNS algorithms with a similar memory footprint, like FAISS and IVFOADC+G+P, plateau at around 50% 1-recall@1. Alternately, in the high-recall regime, DiskANN can index and serve 5-10x more points per node compared to state-of-the-art graph-based methods such as HNSW and NSG. Finally, as part of our overall DiskANN system, we introduce Vamana, a new graph-based ANNS index that is more versatile than existing graph indices, even for in-memory indices.
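
The 1-recall@1 figure quoted above is a special case of k-recall@k: the fraction of the true k nearest neighbors that appear among the k results an index returns, averaged over queries. The sketch below is only an illustration of that metric under stated assumptions. The helper names (`brute_force_knn`, `recall_at_k`, `mean_recall`), the toy points, and the `approx` results are all hypothetical and are not taken from DiskANN.

```python
import math

def brute_force_knn(points, query, k):
    """Exact ground truth: indices of the k points closest to query (Euclidean)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]

def recall_at_k(returned, truth, k):
    """k-recall@k for one query: share of the true top k found in the returned top k."""
    return len(set(returned[:k]) & set(truth[:k])) / k

def mean_recall(all_returned, all_truth, k):
    """Average k-recall@k over a batch of queries."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_returned, all_truth)]
    return sum(scores) / len(scores)

# Toy data (hypothetical): four 2-D points and two queries.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
queries = [(0.1, 0.1), (4.0, 4.0)]
truth = [brute_force_knn(points, q, 1) for q in queries]

# Hypothetical output of an approximate index: correct on the first query only.
approx = [[0], [2]]
print(mean_recall(approx, truth, 1))  # prints 0.5
```

With k = 1 the metric reduces to the share of queries whose single returned point is the exact nearest neighbor, which is how a figure such as "95%+ 1-recall@1" can be read.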


Comparing energy consumption and accuracy in text classification inference

arXiv.org Artificial Intelligence

The increasing deployment of large language models (LLMs) in natural language processing (NLP) tasks raises concerns about energy efficiency and sustainability. While prior research has largely focused on energy consumption during model training, the inference phase has received comparatively less attention. This study systematically evaluates the trade-offs between model accuracy and energy consumption in text classification inference across various model architectures and hardware configurations. Our empirical analysis shows that the best-performing model in terms of accuracy can also be energy-efficient, while larger LLMs tend to consume significantly more energy with lower classification accuracy. We observe substantial variability in inference energy consumption ($<$mWh to $>$kWh), influenced by model type, model size, and hardware specifications. Additionally, we find a strong correlation between inference energy consumption and model runtime, indicating that execution time can serve as a practical proxy for energy usage in settings where direct measurement is not feasible. These findings have implications for sustainable AI development, providing actionable insights for researchers, industry practitioners, and policymakers seeking to balance performance and resource efficiency in NLP applications.


NNTile: a machine learning framework capable of training extremely large GPT language models on a single node

arXiv.org Artificial Intelligence

This study presents the NNTile framework for training large deep neural networks in heterogeneous clusters. NNTile is based on the StarPU library, which implements task-based parallelism and schedules all provided tasks onto all available processing units (CPUs and GPUs). This means that any particular operation needed to train a large neural network can be performed on any of the CPU cores or GPU devices, depending on automatic scheduling decisions. Such an approach shifts the burden of deciding where to compute and when to communicate from a human being to an automatic decision maker, whether a simple greedy heuristic or complex AI-based software. The performance of the presented tool for training large language models is demonstrated in extensive numerical experiments.


The Built-In Robustness of Decentralized Federated Averaging to Bad Data

arXiv.org Artificial Intelligence

Decentralized federated learning (DFL) enables devices to collaboratively train models over complex network topologies without relying on a central controller. In this setting, local data remains private, but its quality and quantity can vary significantly across nodes. The extent to which a fully decentralized system is vulnerable to poor-quality or corrupted data remains unclear, but several factors could contribute to potential risks. Without a central authority, there can be no unified mechanism to detect or correct errors, and each node operates with a localized view of the data distribution, making it difficult for the node to assess whether its perspective aligns with the true distribution. Moreover, models trained on low-quality data can propagate through the network, amplifying errors. To explore the impact of low-quality data on DFL, we simulate two scenarios with degraded data quality -- one where the corrupted data is evenly distributed in a subset of nodes and one where it is concentrated on a single node -- using a decentralized implementation of FedAvg. Our results reveal that averaging-based decentralized learning is remarkably robust to localized bad data, even when the corrupted data resides in the most influential nodes of the network. Counterintuitively, this robustness is further enhanced when the corrupted data is concentrated on a single node, regardless of its centrality in the communication network topology. This phenomenon is explained by the averaging process, which ensures that no single node -- however central -- can disproportionately influence the overall learning process.
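
The averaging argument in the last sentence can be illustrated with a toy consensus simulation. This is a minimal sketch under strong assumptions, not the paper's FedAvg setup: each node holds a single number instead of a model, there is no local training, the topology is a small ring, and the function names (`average_round`, `ring_neighbors`) are hypothetical.

```python
def ring_neighbors(n):
    """Neighbor lists for n nodes arranged in a ring (each node has two neighbors)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]

def average_round(values, neighbors):
    """One synchronous round: every node averages its own value with its neighbors' values."""
    new_values = []
    for node, value in enumerate(values):
        group = [value] + [values[j] for j in neighbors[node]]
        new_values.append(sum(group) / len(group))
    return new_values

n = 6
values = [1.0] * n   # five "clean" nodes agree on 1.0
values[0] = 7.0      # one node holds a corrupted value (toy stand-in for bad data)
neighbors = ring_neighbors(n)

for _ in range(100):
    values = average_round(values, neighbors)

# Every node ends up close to the plain mean (5 * 1.0 + 7.0) / 6 = 2.0.
print([round(v, 6) for v in values])
```

In this toy setting the corrupted node contributes with weight 1/n to the value every node converges to, so its influence shrinks as the network grows. On irregular topologies plain neighbor averaging weights nodes unevenly, so the sketch should not be read as covering the paper's centrality experiments.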


Reviews: DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node

Neural Information Processing Systems

The writing could be improved, but it is generally understandable. However, citation quality can be improved. In particular, it seems to me that NSG and HNSW are actually using the same pruning rule (which results in an approximate relative neighborhood graph). I really like your updated version, which reduces the number of hops (and I haven't seen this pruning variant before)! Detailed comments: Abstract and further: "base points" sounds like a strange term; do you mean domain points? Please find a more specific-generic citation that describes this phenomenon.


Reviews: DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node

Neural Information Processing Systems

In post-rebuttal discussions, the reviewers concurred that the paper presents a solid state-of-the-art implementation and very impressive results, which will have a good impact for practitioners. This significant impact by itself was worthy of publication.


Age of Gossip on Generalized Rings

arXiv.org Artificial Intelligence

We consider a gossip network consisting of a source forwarding updates and $n$ nodes placed geometrically in a ring formation. Each node gossips with $f(n)$ nodes on either side, thus communicating with $2f(n)$ nodes in total. $f(n)$ is a sub-linear, non-decreasing and positive function. The source keeps updates of a process, which might be generated or observed, and shares them with the nodes in the ring network. The nodes in the ring network communicate with their neighbors and disseminate these version updates using a push-style gossip strategy. We use the version age metric to quantify the timeliness of information at the nodes. Prior to this work, it was shown that the version age scales as $O(n^{\frac{1}{2}})$ in a ring network, i.e., when $f(n)=1$, and as $O(\log{n})$ in a fully-connected network, i.e., when $2f(n)=n-1$. In this paper, we find an upper bound for the average version age for a set of nodes in such a network in terms of the number of nodes $n$ and the number of gossiped neighbors $2f(n)$. We show that if $f(n) = \Omega(\frac{n}{\log^2{n}})$, then the version age still scales as $\Theta(\log{n})$. We also show that if $f(n)$ is a rational function, then the version age also scales as a rational function. In particular, if $f(n)=n^\alpha$, then the version age is $O(n^{\frac{1-\alpha}{2}})$. Finally, through numerical calculations we verify that, for all practical purposes, if $f(n) = \Omega(n^{0.6})$, the version age scales as $O(\log{n})$.
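
As a quick sanity check on the stated bound, two values of $\alpha$ can be substituted into it. This is only a worked substitution based on the abstract; the restriction $0 \le \alpha < 1$ is an assumption inferred from the requirement that $f(n)$ be sub-linear.

```latex
% Bound quoted in the abstract: f(n) = n^{\alpha} gives version age O(n^{(1-\alpha)/2}).
% Assumed range: 0 \le \alpha < 1, since f(n) must be sub-linear.
\begin{align*}
\alpha = 0:\quad & f(n) = 1, && O\!\left(n^{\frac{1-0}{2}}\right) = O\!\left(n^{\frac{1}{2}}\right) \\
\alpha = \tfrac{1}{2}:\quad & f(n) = n^{\frac{1}{2}}, && O\!\left(n^{\frac{1-1/2}{2}}\right) = O\!\left(n^{\frac{1}{4}}\right)
\end{align*}
```

The case $\alpha = 0$ recovers the previously known $O(n^{\frac{1}{2}})$ scaling for the plain ring with $f(n)=1$, so the general bound agrees with the earlier result.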


Introduction to Artificial Neural Networks

#artificialintelligence

Artificial neural networks map input data to output data through layers, so any discussion of neural networks should start with this layered representation of data. The layers are the input layer, one or more hidden layers, and the output layer. As you can see in the following figure, the input layer accepts the raw data on which we want to perform an operation such as classification or regression. For example, if you want to classify a picture, you should feed in every one of its pixels.
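
The input, hidden, and output layers described above can be sketched as a single forward pass. This is a minimal illustration, not code from the article: the layer sizes, the hand-picked weights, the choice of ReLU and sigmoid activations, and the helper name `dense` are all assumptions made for the example.

```python
import math

def relu(z):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each row of weights."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Input layer: every pixel of a flattened 2x2 grayscale image (hypothetical values).
pixels = [0.0, 0.5, 1.0, 0.5]

# Hidden layer with 3 units and output layer with 1 unit (weights chosen by hand).
hidden_w = [[0.5, -0.5, 1.0, 0.0],
            [1.0, 1.0, -1.0, 0.5],
            [0.0, 0.5, 0.5, -1.0]]
hidden_b = [0.0, 0.0, 0.0]
out_w = [[1.0, -1.0, 0.5]]
out_b = [0.0]

hidden = dense(pixels, hidden_w, hidden_b, relu)
output = dense(hidden, out_w, out_b, sigmoid)
print(hidden)  # prints [0.75, 0.0, 0.25]
print(output)  # a single score between 0 and 1
```

In a real network the weights would be learned from data instead of written by hand, and the output score could be read as the probability of one class in a two-class problem.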


Generate Yu-Gi-Oh! card images using Generative Adversarial Network

#artificialintelligence

Generative Adversarial Networks (GANs) are a deep learning architecture for generating new samples that could have come from the distribution of an existing set of samples. A GAN consists of both a generator model and a discriminator model. The generator is responsible for generating new samples from the latent space, and the discriminator is responsible for classifying whether samples are real or fake. First, I had to gather the Yu-Gi-Oh! The API was used to download all Yugioh cards (10,000 cards).